Task

A simple yet powerful marketing technique is RFM analysis: using recency (how recently the customer made their last purchase), frequency (how often the customer purchased in a given period), and monetary (how much the customer spent in a given period) data to identify the best customers and run targeted marketing campaigns.

As a data scientist, you are asked to segment the customers using transaction data and profile the segments by their characteristics (recency, frequency, monetary). After you find the segments, give them understandable names so that the marketing team can easily create campaign strategies.

Data: ../data/transactions.csv

Hints: For each customer id, compute the time difference between their last transaction and today. Also calculate the number of transactions and the total amount spent. You are allowed to use SQL.

Output: Push the executed notebook to your GitHub repo and submit the URL to the class leader no later than August 21, 2021. Note that the notebook must contain exploratory analysis and clustering, as well as a story about your findings.

Good luck!
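Since the hints allow SQL, the whole RFM aggregation can be expressed as a single GROUP BY query. A minimal sketch using an in-memory SQLite table with a few made-up rows (column names follow the CSV; the reference date 2021-08-21 is only an example stand-in for "today"):

```python
import sqlite3
import pandas as pd

# Made-up sample rows with the CSV's column names
sample = pd.DataFrame({
    'customer_id': ['CS1', 'CS1', 'CS2'],
    'trans_date': ['2018-02-11', '2020-03-15', '2018-02-26'],
    'trans_amount': [35, 39, 52],
})

con = sqlite3.connect(':memory:')
sample.to_sql('transactions', con, index=False)

# recency = days since last purchase, frequency = number of transactions,
# monetary = total spend -- all per customer_id
rfm = pd.read_sql("""
    SELECT customer_id,
           CAST(julianday('2021-08-21') - julianday(MAX(trans_date)) AS INTEGER) AS recency,
           COUNT(*) AS frequency,
           SUM(trans_amount) AS monetary
    FROM transactions
    GROUP BY customer_id
""", con)
print(rfm)
```

The same three aggregates are computed with pandas groupby operations later in this notebook.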
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly as py
import plotly.graph_objs as go
import plotly.express as px
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score, homogeneity_score
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_blobs
from yellowbrick.cluster import KElbowVisualizer, SilhouetteVisualizer
from scipy.cluster.hierarchy import dendrogram, linkage
import warnings
warnings.filterwarnings("ignore")
import os
transaction_df = pd.read_csv(r'data/transactions.csv')
transaction_df.head()
| customer_id | trans_date | trans_amount | |
|---|---|---|---|
| 0 | CS5295 | 11-Feb-18 | 35 |
| 1 | CS4768 | 15-Mar-20 | 39 |
| 2 | CS2122 | 26-Feb-18 | 52 |
| 3 | CS1217 | 16-Nov-16 | 99 |
| 4 | CS1850 | 20-Nov-18 | 78 |
#Check column data types
transaction_df.dtypes
customer_id     object
trans_date      object
trans_amount     int64
dtype: object
#We want to change trans_date type to datetime format
pd.to_datetime(transaction_df['trans_date'])
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-4-765e841a055a> in <module>
      1 #We want to change trans_date type to datetime format
----> 2 pd.to_datetime(transaction_df['trans_date'])

ParserError: day is out of range for month: 29-Feb-17
transaction_df[transaction_df['trans_date']=='28-Feb-17'].count()
customer_id     82
trans_date      82
trans_amount    82
dtype: int64
transaction_df[transaction_df['trans_date']=='29-Feb-17'].count()
customer_id     90
trans_date      90
trans_amount    90
dtype: int64
transaction_df[transaction_df['trans_date']=='01-Mar-17'].count()
customer_id     91
trans_date      91
trans_amount    91
dtype: int64
transaction_df.groupby('trans_date').count().min()
customer_id     55
trans_amount    55
dtype: int64
transaction_df.groupby('trans_date').count().max()
customer_id     141
trans_amount    141
dtype: int64
From the output we can see that the trans_date column contains an invalid date, 29-Feb-17 (2017 was not a leap year), so we will drop the rows with that date. We choose to drop rather than reassign because the per-date row counts range from 55 to 141; moving the 90 affected rows to 01-Mar-17 or 28-Feb-17 would distort the daily pattern in the data.
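Instead of counting rows date by date, pandas can flag the unparseable dates directly. A minimal sketch using a small inline sample rather than the real CSV:

```python
import pandas as pd

# Small sample: 2017 was not a leap year, so 29-Feb-17 cannot be parsed
sample = pd.DataFrame({'trans_date': ['28-Feb-17', '29-Feb-17', '01-Mar-17']})

# errors='coerce' turns unparseable strings into NaT instead of raising
parsed = pd.to_datetime(sample['trans_date'], errors='coerce')

# Rows where parsing failed hold the invalid dates
invalid = sample.loc[parsed.isna(), 'trans_date']
print(invalid.tolist())  # ['29-Feb-17']
```

Applied to the full transactions frame, this would surface every invalid date in one pass.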
# Data Cleansing
transaction_df_2 = transaction_df.drop(transaction_df[transaction_df['trans_date']=='29-Feb-17'].index)
transaction_df_2
| customer_id | trans_date | trans_amount | |
|---|---|---|---|
| 0 | CS5295 | 11-Feb-18 | 35 |
| 1 | CS4768 | 15-Mar-20 | 39 |
| 2 | CS2122 | 26-Feb-18 | 52 |
| 3 | CS1217 | 16-Nov-16 | 99 |
| 4 | CS1850 | 20-Nov-18 | 78 |
| ... | ... | ... | ... |
| 124995 | CS8433 | 26-Jun-16 | 64 |
| 124996 | CS7232 | 19-Aug-19 | 38 |
| 124997 | CS8731 | 28-Nov-19 | 42 |
| 124998 | CS8133 | 14-Dec-18 | 13 |
| 124999 | CS7996 | 13-Dec-19 | 36 |
124910 rows × 3 columns
#We want to change trans_date type to datetime format
transaction_df_2['trans_date']=pd.to_datetime(transaction_df_2['trans_date'])
#Check column data types
transaction_df_2.dtypes
customer_id             object
trans_date      datetime64[ns]
trans_amount             int64
dtype: object
transaction_df_2
| customer_id | trans_date | trans_amount | |
|---|---|---|---|
| 0 | CS5295 | 2018-02-11 | 35 |
| 1 | CS4768 | 2020-03-15 | 39 |
| 2 | CS2122 | 2018-02-26 | 52 |
| 3 | CS1217 | 2016-11-16 | 99 |
| 4 | CS1850 | 2018-11-20 | 78 |
| ... | ... | ... | ... |
| 124995 | CS8433 | 2016-06-26 | 64 |
| 124996 | CS7232 | 2019-08-19 | 38 |
| 124997 | CS8731 | 2019-11-28 | 42 |
| 124998 | CS8133 | 2018-12-14 | 13 |
| 124999 | CS7996 | 2019-12-13 | 36 |
124910 rows × 3 columns
#Check Duplicate data
transaction_df_2.duplicated().sum()
6
#Drop duplicate data
transaction_df_2=transaction_df_2.drop_duplicates()
#Check Duplicate data
transaction_df_2.duplicated().sum()
0
#Check Null Value
transaction_df_2.isnull().values.any()
False
# Recency: days between today and each transaction
today = pd.Timestamp.today()
transaction_df_2['Recency'] = (today - transaction_df_2['trans_date']).dt.days
# Recency per customer: days since their most recent transaction
recency = transaction_df_2.groupby('customer_id')[['Recency']].min()
recency
| Recency | |
|---|---|
| customer_id | |
| CS1112 | 577 |
| CS1113 | 551 |
| CS1114 | 548 |
| CS1115 | 526 |
| CS1116 | 719 |
| ... | ... |
| CS8996 | 613 |
| CS8997 | 777 |
| CS8998 | 600 |
| CS8999 | 773 |
| CS9000 | 532 |
6889 rows × 1 columns
# Frequency per customer: number of transactions
frequency = transaction_df_2.groupby('customer_id')[['trans_date']].count()
frequency.columns = ['frequency']
frequency
| frequency | |
|---|---|
| customer_id | |
| CS1112 | 15 |
| CS1113 | 20 |
| CS1114 | 19 |
| CS1115 | 22 |
| CS1116 | 13 |
| ... | ... |
| CS8996 | 13 |
| CS8997 | 14 |
| CS8998 | 13 |
| CS8999 | 12 |
| CS9000 | 13 |
6889 rows × 1 columns
# Monetary per customer: total amount spent
monetary = transaction_df_2.groupby('customer_id')[['trans_amount']].sum()
monetary
| trans_amount | |
|---|---|
| customer_id | |
| CS1112 | 1012 |
| CS1113 | 1490 |
| CS1114 | 1432 |
| CS1115 | 1659 |
| CS1116 | 857 |
| ... | ... |
| CS8996 | 582 |
| CS8997 | 543 |
| CS8998 | 624 |
| CS8999 | 383 |
| CS9000 | 533 |
6889 rows × 1 columns
#Join recency and frequency
data = recency.join(frequency)
#Join data and monetary
data = data.join(monetary)
data
| Recency | frequency | trans_amount | |
|---|---|---|---|
| customer_id | |||
| CS1112 | 577 | 15 | 1012 |
| CS1113 | 551 | 20 | 1490 |
| CS1114 | 548 | 19 | 1432 |
| CS1115 | 526 | 22 | 1659 |
| CS1116 | 719 | 13 | 857 |
| ... | ... | ... | ... |
| CS8996 | 613 | 13 | 582 |
| CS8997 | 777 | 14 | 543 |
| CS8998 | 600 | 13 | 624 |
| CS8999 | 773 | 12 | 383 |
| CS9000 | 532 | 13 | 533 |
6889 rows × 3 columns
data.shape
(6889, 3)
data.describe()
| Recency | frequency | trans_amount | |
|---|---|---|---|
| count | 6889.000000 | 6889.000000 | 6889.000000 |
| mean | 596.349252 | 18.130933 | 1178.359123 |
| std | 85.545443 | 5.192154 | 465.606743 |
| min | 515.000000 | 4.000000 | 149.000000 |
| 25% | 538.000000 | 14.000000 | 781.000000 |
| 50% | 569.000000 | 18.000000 | 1227.000000 |
| 75% | 627.000000 | 22.000000 | 1520.000000 |
| max | 1373.000000 | 39.000000 | 2933.000000 |
# Check inter-variable correlation
sns.pairplot(data, aspect=1.5)
plt.show()
From the pair plot we can see a near-linear correlation between the frequency and trans_amount variables.
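That visual impression can be checked numerically with a correlation matrix. A minimal sketch on synthetic RFM-like data (on the real frame you would simply call `data.corr()`):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
freq = rng.integers(4, 40, size=200)
# trans_amount roughly proportional to frequency, mimicking the pair-plot pattern
amount = freq * 65 + rng.normal(0, 100, size=200)
demo = pd.DataFrame({'frequency': freq, 'trans_amount': amount})

# A Pearson coefficient close to 1 confirms the near-linear relationship
print(demo.corr().loc['frequency', 'trans_amount'])
```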
standardize = StandardScaler()
data_std = pd.DataFrame(standardize.fit_transform(data))
data_std.columns = data.columns
data_std
| Recency | frequency | trans_amount | |
|---|---|---|---|
| 0 | -0.226203 | -0.603056 | -0.357321 |
| 1 | -0.530157 | 0.360005 | 0.669371 |
| 2 | -0.565229 | 0.167393 | 0.544793 |
| 3 | -0.822421 | 0.745230 | 1.032364 |
| 4 | 1.433854 | -0.988281 | -0.690244 |
| ... | ... | ... | ... |
| 6884 | 0.194656 | -0.988281 | -1.280914 |
| 6885 | 2.111905 | -0.795668 | -1.364682 |
| 6886 | 0.042679 | -0.988281 | -1.190703 |
| 6887 | 2.065143 | -1.180893 | -1.708345 |
| 6888 | -0.752278 | -0.988281 | -1.386161 |
6889 rows × 3 columns
First, we will try to segment the data based on 2 variables: Recency and frequency
data_is = data_std[['Recency' , 'frequency']].values
data_is
array([[-0.22620325, -0.60305619],
[-0.53015733, 0.36000517],
[-0.56522896, 0.1673929 ],
...,
[ 0.04267922, -0.98828073],
[ 2.06514296, -1.180893 ],
[-0.75227763, -0.98828073]])
k = 3
k_means = KMeans(init = "k-means++", n_clusters = k, random_state= 123)
k_means.fit(data_is)
centroids = k_means.cluster_centers_
centroids
array([[-0.28626929, -0.69791183],
[-0.3431659 , 0.88239681],
[ 1.9265558 , -0.73311912]])
Segment = k_means.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Recency', y='frequency', data=data, hue=Segment,
                palette=sns.color_palette('hls', k))
plt.title('KMeans with {} Clusters'.format(k))
plt.show()
k_means.cluster_centers_ returns the centroids in standardized units. We can easily recover the centroids on the original scale by averaging the raw data within each segment:
centroids = data[['Recency','frequency']].groupby(Segment, as_index=False).mean()
centroids
| Recency | frequency | |
|---|---|---|
| 0 | 571.861996 | 14.507531 |
| 1 | 566.995104 | 22.712141 |
| 2 | 761.145361 | 14.324742 |
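Equivalently, because StandardScaler stores the per-column `mean_` and `scale_`, the standardized centroids can be mapped back to original units arithmetically. A minimal self-contained sketch with synthetic Recency/frequency-like data (not the notebook's actual values):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for Recency (days) and frequency (counts)
rng = np.random.default_rng(123)
X = np.column_stack([rng.normal(600, 85, 500), rng.normal(18, 5, 500)])

scaler = StandardScaler().fit(X)
km = KMeans(n_clusters=3, n_init=10, random_state=123).fit(scaler.transform(X))

# Undo z-scoring: x = z * scale_ + mean_ (same as scaler.inverse_transform)
centroids_original = km.cluster_centers_ * scaler.scale_ + scaler.mean_
print(centroids_original)
```

Both routes give the same answer when the scaler and the clustering use the same columns.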
def k_means_inertia_silhouette(data, k_max=9, init='k-means++'):
    inertia = []
    silhouette = []
    for k in range(2, k_max):
        km = KMeans(init=init, n_clusters=k, random_state=123)
        km.fit(data)
        inertia.append(km.inertia_)
        silhouette.append(silhouette_score(data, km.labels_))

    plt.figure(figsize=(15, 6))
    plt.plot(np.arange(2, k_max), inertia, 'o')
    plt.plot(np.arange(2, k_max), inertia, '-', alpha=0.5)
    plt.xlabel('Number of Clusters')
    plt.ylabel('Inertia')
    plt.title('The Elbow method using inertia for each number of clusters')
    plt.show()

    plt.figure(figsize=(15, 6))
    plt.plot(np.arange(2, k_max), silhouette, 'o')
    plt.plot(np.arange(2, k_max), silhouette, '-', alpha=0.5)
    plt.xlabel('Number of Clusters')
    plt.ylabel('Silhouette')
    plt.title('Silhouette score for each number of clusters')
    plt.show()

    return inertia, silhouette
inertia, silhouette = k_means_inertia_silhouette(data = data_is)
# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(2,10))
visualizer.fit(data_std) # Fit the data to the visualizer
visualizer.poof() # Draw/show/poof the data
<AxesSubplot:title={'center':'Distortion Score Elbow for KMeans Clustering'}, xlabel='k', ylabel='distortion score'>
k = 4
k_means = KMeans(init = "k-means++", n_clusters = k, random_state= 123)
k_means.fit(data_is)
centroids = k_means.cluster_centers_
centroids
array([[-0.41825981, 0.91510632],
[ 0.94502634, -0.37751625],
[ 3.13182193, -1.1681332 ],
[-0.43757019, -0.72787216]])
Segment = k_means.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(x='Recency', y='frequency', data=data, hue=Segment,
                palette=sns.color_palette('hls', k))
plt.title('KMeans with {} Clusters'.format(k))
plt.show()
centroids = data[['Recency','frequency']].groupby(Segment, as_index=False).mean()
centroids
| Recency | frequency | |
|---|---|---|
| 0 | 560.571629 | 22.881961 |
| 1 | 677.238095 | 16.166289 |
| 2 | 864.617089 | 12.072785 |
| 3 | 558.919833 | 14.351983 |
inertia, silhouette = k_means_inertia_silhouette(data_std)
# Instantiate the clustering model and visualizer
model = KMeans()
visualizer = KElbowVisualizer(model, k=(2,10))
visualizer.fit(data_std) # Fit the data to the visualizer
visualizer.poof() # Draw/show/poof the data
<AxesSubplot:title={'center':'Distortion Score Elbow for KMeans Clustering'}, xlabel='k', ylabel='distortion score'>
k = 4
k_means = KMeans(init = "k-means++", n_clusters = k, random_state= 123)
k_means.fit(data_std)
KMeans(n_clusters=4, random_state=123)
Segment = k_means.labels_
data_std.groupby(Segment, as_index=False).mean()
| Recency | frequency | trans_amount | |
|---|---|---|---|
| 0 | -0.191317 | -1.038916 | -1.148154 |
| 1 | -0.351725 | 1.270868 | 1.222335 |
| 2 | 2.331673 | -0.951922 | -0.915636 |
| 3 | -0.198772 | 0.085647 | 0.178141 |
data['Segment'] = list(map(str, Segment))
fig = px.scatter_3d(data, x='Recency', y='frequency', z='trans_amount', color='Segment',
opacity = 0.8, height=800)
fig.show() # to show as static image: fig.show('svg')
data_std_2 = data_std.copy()
data_std_2['cluster']= k_means.labels_
melted_RFM=pd.melt(data_std_2.reset_index(),
id_vars=['cluster'],
value_vars=['Recency','frequency','trans_amount'],
var_name='Features',
value_name='Value'
)
sns.lineplot(x='Features', y='Value', hue='cluster', data=melted_RFM)
plt.legend()
<matplotlib.legend.Legend at 0x1e04ee72280>
agglo_cluster = AgglomerativeClustering(n_clusters = 4, affinity = 'euclidean', linkage = 'complete')
Segment = agglo_cluster.fit_predict(data_std)
centroids = data.groupby(Segment, as_index=False).mean()
centroids
| Recency | frequency | trans_amount | |
|---|---|---|---|
| 0 | 581.030843 | 18.262321 | 1191.823688 |
| 1 | 809.322581 | 12.135945 | 653.391705 |
| 2 | 580.445161 | 30.251613 | 2152.870968 |
| 3 | 1235.100000 | 7.800000 | 387.800000 |
data_std_3 = data_std.copy()
data_std_3['cluster']=Segment
data['Segment'] = list(map(str, Segment))
fig = px.scatter_3d(data, x='Recency', y='frequency', z='trans_amount', color='Segment',
opacity = 0.8, height=800)
fig.show()
melted_RFM=pd.melt(data_std_3.reset_index(),
id_vars=['cluster'],
value_vars=['Recency','frequency','trans_amount'],
var_name='Features',
value_name='Value'
)
sns.lineplot(x='Features', y='Value', hue='cluster', data=melted_RFM)
plt.legend()
<matplotlib.legend.Legend at 0x1e04f02b340>
S=data_std.to_numpy()
S
array([[-0.22620325, -0.60305619, -0.35732128],
[-0.53015733, 0.36000517, 0.66937067],
[-0.56522896, 0.1673929 , 0.54479299],
...,
[ 0.04267922, -0.98828073, -1.19070304],
[ 2.06514296, -1.180893 , -1.7083448 ],
[-0.75227763, -0.98828073, -1.38616113]])
# Try to make estimated number of clusters = 4
db = DBSCAN(eps=0.2, min_samples=14).fit(S)
# Number of clusters in labels, ignoring noise if present.
# the label -1 is considered as noise by the DBSCAN algorithm
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)
Estimated number of clusters: 4
Estimated number of noise points: 1484
Segmentdb = db.labels_
data['Segmentdb'] = list(map(str, Segmentdb))
fig = px.scatter_3d(data, x='Recency', y='frequency', z='trans_amount', color='Segmentdb',
opacity = 0.8, height=800)
fig.show() # to show as static image: fig.show('svg')
When we tune DBSCAN to produce 4 clusters, we end up with a large number of points labeled as noise (1,484 outliers).
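DBSCAN's eps is usually easier to choose from a k-distance plot than by trial and error: sort every point's distance to its min_samples-th nearest neighbor and pick eps near the knee of the curve. A minimal sketch on synthetic standardized data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # stand-in for the standardized RFM matrix

min_samples = 14
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)

# kneighbors includes each point itself at distance 0, so the last column
# is the distance to the (min_samples - 1)-th other neighbor; the knee of
# this sorted curve is a reasonable eps candidate for DBSCAN
k_dist = np.sort(distances[:, -1])
print(k_dist[0], k_dist[-1])
```

Plotting `k_dist` and reading off the elbow typically gives a less arbitrary eps than the 0.2 used above.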
# Hierarchical
unique, counts = np.unique(Segment, return_counts=True)
dict(zip(unique, counts))
{0: 6290, 1: 434, 2: 155, 3: 10}
# DbScan
unique, counts = np.unique(db.labels_, return_counts=True)
dict(zip(unique, counts))
{-1: 1484, 0: 5370, 1: 10, 2: 17, 3: 8}
# K-means
unique, counts = np.unique(k_means.labels_, return_counts=True)
dict(zip(unique, counts))
{0: 1765, 1: 1738, 2: 641, 3: 2745}
Judging by cluster sizes, K-Means gives far more balanced segments than hierarchical clustering and DBSCAN on this dataset (hierarchical puts 6,290 of 6,889 customers into a single cluster, and DBSCAN labels 1,484 points as noise), so we choose K-Means for our customer segmentation.
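Beyond cluster-size balance, the choice can be backed quantitatively with silhouette scores for each candidate labeling. A minimal sketch on synthetic blobs (in the notebook you would pass data_std and each fitted model's labels):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Four well-separated synthetic clusters as a stand-in for the RFM data
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 0], [0, 8], [8, 8]],
                  cluster_std=1.0, random_state=123)

# Higher silhouette (max 1.0) means tighter, better-separated clusters
labels_km = KMeans(n_clusters=4, n_init=10, random_state=123).fit_predict(X)
labels_ag = AgglomerativeClustering(n_clusters=4).fit_predict(X)

scores = {
    'kmeans': silhouette_score(X, labels_km),
    'agglomerative': silhouette_score(X, labels_ag),
}
print(scores)
```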
data.drop(columns=['Segment', 'Segmentdb'], inplace=True)
# Assign descriptive names to the k-means labels:
# 1 -> Very loyal, 3 -> Medium loyal, 0 -> Low loyal / New Customer, 2 -> Not Loyal
segment_names = {1: 'Very loyal', 3: 'Medium loyal',
                 0: 'Low loyal / New Customer', 2: 'Not Loyal'}
data['Customer Segment'] = pd.Series(k_means.labels_, index=data.index).map(segment_names)
data
| Recency | frequency | trans_amount | Customer Segment | |
|---|---|---|---|---|
| customer_id | ||||
| CS1112 | 577 | 15 | 1012 | Medium loyal |
| CS1113 | 551 | 20 | 1490 | Medium loyal |
| CS1114 | 548 | 19 | 1432 | Medium loyal |
| CS1115 | 526 | 22 | 1659 | Very loyal |
| CS1116 | 719 | 13 | 857 | Not Loyal |
| ... | ... | ... | ... | ... |
| CS8996 | 613 | 13 | 582 | Low loyal / New Customer |
| CS8997 | 777 | 14 | 543 | Not Loyal |
| CS8998 | 600 | 13 | 624 | Low loyal / New Customer |
| CS8999 | 773 | 12 | 383 | Not Loyal |
| CS9000 | 532 | 13 | 533 | Low loyal / New Customer |
6889 rows × 4 columns
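For the marketing team, a per-segment profile table condenses each segment into one row. A minimal sketch using a tiny stand-in frame with the same columns as the final data above:

```python
import pandas as pd

# Tiny stand-in for the final segmented data frame
data = pd.DataFrame({
    'Recency': [577, 551, 526, 719, 613],
    'frequency': [15, 20, 22, 13, 13],
    'trans_amount': [1012, 1490, 1659, 857, 582],
    'Customer Segment': ['Medium loyal', 'Medium loyal', 'Very loyal',
                         'Not Loyal', 'Low loyal / New Customer'],
})

# Segment size plus average R, F, and M per segment
profile = data.groupby('Customer Segment').agg(
    customers=('Recency', 'size'),
    avg_recency=('Recency', 'mean'),
    avg_frequency=('frequency', 'mean'),
    avg_spend=('trans_amount', 'mean'),
)
print(profile)
```

Run on the real frame, this table is what campaign planning would start from: e.g., high-spend "Very loyal" customers get retention offers, while high-recency "Not Loyal" customers get win-back campaigns.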
!jupyter nbconvert --to html Clustering_RFM.ipynb
[NbConvertApp] Converting notebook Clustering_RFM.ipynb to html [NbConvertApp] Writing 5349136 bytes to Clustering_RFM.html